Search Result

Unlabeled network pruning algorithm based on Bayesian optimization

GAO Yuanyuan, YU Zhenhua, DU Fang, SONG Lijuan

Journal of Computer Applications 2023, 43 (1): 30-36. DOI: 10.11772/j.issn.1001-9081.2021112020

To address the excessive parameter counts and computational cost of Deep Neural Networks (DNNs), an unlabeled neural network pruning algorithm based on Bayesian optimization was proposed. Firstly, a global pruning strategy was adopted, which effectively avoided the sub-optimal model compression ratio caused by layer-by-layer pruning. Secondly, the pruning process was independent of the labels of the data samples, and the compression ratios of all layers were optimized by minimizing the distance between the output features of the pruned and baseline networks. Finally, the Bayesian optimization algorithm was adopted to find the optimal compression ratio of each layer, thereby improving the efficiency and accuracy of sub-network search. Experimental results show that when the VGG-16 network is compressed by the proposed algorithm on the CIFAR-10 dataset, the parameter compression ratio is 85.32% and the FLoating-point OPerations (FLOPs) compression ratio is 69.20%, with only 0.43% accuracy loss. Therefore, the proposed algorithm can compress DNN models effectively while the compressed model maintains good accuracy.
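
The label-free objective described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the tiny fully connected ReLU network, the magnitude-based pruning rule, and the function names are hypothetical, and plain random search stands in for the Bayesian optimizer.

```python
import random


def prune_layer(weights, ratio):
    """Zero the `ratio` fraction of smallest-magnitude weights in one layer."""
    cells = sorted((abs(w), i, j) for i, row in enumerate(weights)
                   for j, w in enumerate(row))
    pruned = [row[:] for row in weights]
    for _, i, j in cells[: int(len(cells) * ratio)]:
        pruned[i][j] = 0.0
    return pruned


def forward(layers, x):
    """Run a small fully connected ReLU network (hypothetical toy model)."""
    for weights in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]
    return x


def feature_distance(baseline, ratios, samples):
    """Mean squared distance between pruned and baseline outputs; no labels used."""
    pruned = [prune_layer(w, r) for w, r in zip(baseline, ratios)]
    total = 0.0
    for x in samples:
        a, b = forward(baseline, x), forward(pruned, x)
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return total / len(samples)


def search_ratios(baseline, samples, target, trials=200, seed=0):
    """Random search over per-layer ratios (a stand-in, NOT Bayesian optimization)."""
    rng = random.Random(seed)
    sizes = [sum(len(row) for row in w) for w in baseline]
    best = None
    for _ in range(trials):
        ratios = [rng.random() for _ in baseline]
        removed = sum(int(n * r) for n, r in zip(sizes, ratios))
        if removed < target * sum(sizes):
            continue  # candidate does not reach the overall compression target
        score = feature_distance(baseline, ratios, samples)
        if best is None or score < best[0]:
            best = (score, ratios)
    return best
```

A Bayesian optimizer would replace the sampling loop in `search_ratios`, proposing each new set of ratios from a surrogate model of `feature_distance` instead of drawing it uniformly at random.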

Case reading comprehension method combining syntactic guidance and character attention mechanism

HE Zhenghai, XIAN Yantuan, WANG Meng, YU Zhengtao

Journal of Computer Applications 2021, 41 (8): 2427-2431. DOI: 10.11772/j.issn.1001-9081.2020101568

Case reading comprehension, the application of machine reading comprehension in the judicial field, is one of the important applications of judicial intelligence: a computer reads judgment documents and answers related questions. At present, the mainstream approach to machine reading comprehension is to use a deep learning model to encode the words of the text and obtain a vector representation of the text. The core problems of model construction are how to obtain the semantic representation of the text and how to match the questions with the context. Considering that syntactic information helps the model learn the skeleton of a sentence and that Chinese characters carry latent semantic information, a case reading comprehension method that integrates syntactic guidance and a character attention mechanism was proposed. By fusing syntactic information with Chinese character information, the model's ability to encode case text was improved. Experimental results on the reading comprehension dataset of Law Research Cup 2019 show that, compared with the baseline model, the proposed method increases the Exact Match (EM) value by 0.816 and the F1 value by 1.809%.

Chinese-Vietnamese pseudo-parallel corpus generation based on monolingual language model

JIA Chengxun, LAI Hua, YU Zhengtao, WEN Yonghua, YU Zhiqiang

Journal of Computer Applications 2021, 41 (6): 1652-1658. DOI: 10.11772/j.issn.1001-9081.2020071017

Neural machine translation achieves good results on resource-rich languages, but because of data scarcity it performs poorly on low-resource language pairs such as Chinese-Vietnamese. At present, one of the most effective ways to alleviate this problem is to use existing resources to generate pseudo-parallel data. Considering the availability of monolingual data, a method based on back-translation was proposed. Firstly, a language model trained on a large amount of monolingual data was fused with the neural machine translation model. Then, the language features were integrated into the language model in the back-translation process to generate more standardized and higher-quality pseudo-parallel data. Finally, the generated corpus was added to the original small-scale corpus to train the final translation model. Experimental results on the Chinese-Vietnamese translation tasks show that, compared with ordinary back-translation methods, fusing the pseudo-parallel data generated with the language model improves the BiLingual Evaluation Understudy (BLEU) value of Chinese-Vietnamese neural machine translation by 1.41 percentage points.

Chinese-Vietnamese news topic discovery method based on cross-language neural topic model

YANG Weiya, YU Zhengtao, GAO Shengxiang, SONG Ran

Journal of Computer Applications 2021, 41 (10): 2879-2884. DOI: 10.11772/j.issn.1001-9081.2020122054

In the Chinese-Vietnamese cross-language news topic discovery task, Chinese-Vietnamese parallel corpora are scarce, which makes it difficult to train high-quality bilingual word embeddings, and news texts are generally long, so bilingual word embedding methods struggle to represent the text well. To solve these problems, a Chinese-Vietnamese news topic discovery method based on the Cross-Language Neural Topic Model (CL-NTM) was proposed. In the method, news topic information was used to represent the news text, and bilingual semantic alignment was converted into a bilingual topic alignment task. Firstly, neural topic models based on the variational autoencoder were trained on Chinese and Vietnamese respectively to obtain monolingual abstract representations of the topics. Then, a small-scale parallel corpus was used to map the bilingual topics into the same semantic space. Finally, the K-means method was used to cluster the bilingual topic representations to find the topics of news event clusters. Experimental results show that, compared with the Improved Chinese-English Latent Dirichlet Allocation model (ICE-LDA), the proposed method increases the Macro-F1 value and topic coherence by 4 percentage points and 7 percentage points respectively, showing that it can effectively improve the clustering quality and topic interpretability of news topics.

Chinese-Vietnamese bilingual multi-document news opinion sentence recognition based on sentence association graph

WANG Jian, TANG Shan, HUANG Yuxin, YU Zhengtao

Journal of Computer Applications 2020, 40 (10): 2845-2849. DOI: 10.11772/j.issn.1001-9081.2020020280

Traditional opinion sentence recognition mainly classifies sentences by the emotional features inside each sentence. In cross-lingual multi-document opinion sentence recognition, the associations between sentences in different languages and documents provide additional support for recognition. Therefore, a Chinese-Vietnamese bilingual multi-document news opinion sentence recognition method was proposed by combining the Bi-directional Long Short Term Memory (Bi-LSTM) network framework with sentence association features. Firstly, emotional elements and event elements were extracted from the Chinese-Vietnamese bilingual sentences to construct the sentence association graph, and the sentence association features were obtained by using the TextRank algorithm. Secondly, the Chinese and Vietnamese news texts were encoded in the same semantic space based on bilingual word embedding and Bi-LSTM. Finally, opinion sentence recognition was realized by jointly considering the sentence coding features and semantic features. The theoretical analysis and simulation results show that integrating the sentence association graph can effectively improve the precision of multi-document opinion sentence recognition.
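
As a rough illustration of scoring sentences on an association graph, the sketch below runs the standard weighted TextRank iteration. The graph layout (a symmetric dictionary of edge weights), the sentence identifiers, and how edge weights would be derived from shared emotional and event elements are assumptions for illustration, not details taken from the paper.

```python
def textrank(graph, damping=0.85, iterations=50):
    """Weighted TextRank on an undirected graph.

    graph: dict mapping node -> {neighbor: weight}; assumed symmetric,
    i.e. graph[a][b] == graph[b][a] for every edge.
    """
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {}
        for node in graph:
            rank = 0.0
            for neighbor, weight in graph[node].items():
                out_weight = sum(graph[neighbor].values())
                if out_weight > 0:
                    # Each neighbor shares its score in proportion to edge weight.
                    rank += weight / out_weight * scores[neighbor]
            updated[node] = (1 - damping) + damping * rank
        scores = updated
    return scores
```

With a graph in which one sentence is linked to many others, that sentence receives the highest score, which is the kind of association feature that could then be passed to a classifier alongside the sentence encoding.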

Clustering by fast search and find of density peaks based on spectrum analysis

HAN Zhonghua, BI Kaiyuan, SI Wen, LYU Zhe

Journal of Computer Applications 2019, 39 (2): 409-413. DOI: 10.11772/j.issn.1001-9081.2018061381

To address the inconsistent clustering performance of Clustering by Fast Search and Find of Density Peaks (CFSFDP) across different datasets, an improved CFSFDP algorithm based on spectral clustering, namely CFSFDP-SA (CFSFDP based on Spectrum Analysis), was proposed. Firstly, the high-dimensional nonlinear dataset was mapped into a low-dimensional subspace to achieve dimension reduction, and the clustering problem was transformed into an optimal graph partitioning problem to improve the algorithm's adaptability to the global structure of the data. Secondly, the CFSFDP algorithm was used to cluster the processed dataset. By combining the advantages of the two clustering algorithms, the clustering performance was further improved. Clustering results on two artificial linear datasets, three artificial nonlinear datasets and four real UCI datasets show that, compared with CFSFDP, the CFSFDP-SA algorithm has higher clustering precision, with accuracy improved by up to 14% on high-dimensional datasets, which means that CFSFDP-SA is more adaptable to the original datasets.
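
For readers unfamiliar with the base algorithm, the following is a minimal sketch of the two quantities at the core of CFSFDP: the local density of each point and its distance to the nearest point of higher density. It omits the spectral mapping that CFSFDP-SA adds, and the cutoff-count density, the function names, and the way centers are picked are simplifying assumptions.

```python
import math


def density_peaks(points, cutoff):
    """Return (rho, delta) for each point, the two core CFSFDP quantities."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor gets the largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, k):
    """Pick the k points with the largest rho * delta as cluster centers."""
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return sorted(order[:k])
```

Cluster centers are the points that are both dense and far from any denser point; in a CFSFDP-SA style pipeline these functions would be applied to the low-dimensional embedding rather than to the raw data.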

Improved particle swarm optimization algorithm based on twice search

ZHAO Yanlong, HUA Nan, YU Zhenhua

Journal of Computer Applications 2017, 37 (9): 2541-2546. DOI: 10.11772/j.issn.1001-9081.2017.09.2541

To address the premature convergence of standard Particle Swarm Optimization (PSO) on complex optimization problems, a twice search PSO algorithm based on the gradient descent method was proposed. Firstly, when the global extremum remained unchanged for more than the preset maximum number of iterations, it was judged to be caught in an extremum trap. Then, the gradient descent method was used to perform a second search, and a tabu area, centered on the optimal extremum point with a radius of specified length, was constructed to prevent particles from repeatedly searching the same area. Finally, new particles were generated according to the population diversity criteria to replace the particles to be eliminated. The twice search algorithm and four other improved algorithms were applied to the optimization of four typical test functions. The simulation results show that the convergence accuracy of the twice search particle swarm algorithm is up to 10 orders of magnitude higher, its convergence is faster, and it finds the global optimal solution more easily.

SMFCC: a novel feature extraction method for speech signal

WANG Haibin, YU Zhengtao, MAO Cunli, GUO Jianyi

Journal of Computer Applications 2016, 36 (6): 1735-1740. DOI: 10.11772/j.issn.1001-9081.2016.06.1735

To address the problems of effective speech feature extraction and the influence of noise in speaker recognition, a novel speech feature extraction method called Mel Frequency Cepstral Coefficients based on S-transform (SMFCC) was proposed. The speech features were built on traditional Mel Frequency Cepstral Coefficients (MFCC), exploiting the two-dimensional Time-Frequency (TF) multiresolution property of the S-transform and the effective denoising of the two-dimensional TF matrix by the Singular Value Decomposition (SVD) algorithm, combined with other related statistical methods. The extracted features were compared experimentally with existing features on the TIMIT corpus. The Equal Error Rate (EER) and Minimum Detection Cost Function (MinDCF) of SMFCC were smaller than those of Linear Prediction Cepstral Coefficient (LPCC), MFCC, and LMFCC; in particular, the EER and MinDCF08 of SMFCC were decreased by 3.6% and 17.9% respectively compared to MFCC. The experimental results show that the proposed method can eliminate the noise in the speech signal effectively and improve the local resolution of speech signal features.

Recognition of Chinese news event correlation based on grey relational analysis

LIU Panpan, HONG Xudong, GUO Jianyi, YU Zhengtao, WEN Yonghua, CHEN Wei

Journal of Computer Applications 2016, 36 (2): 408-413. DOI: 10.11772/j.issn.1001-9081.2016.02.0408

To address the low accuracy of identifying related Chinese events, a correlation recognition algorithm for Chinese news events was proposed based on Grey Relational Analysis (GRA), a multi-factor analysis method. Firstly, three factors that affect event correlation, namely the co-occurrence of triggers, the nouns shared between events and the similarity of the event sentences, were identified by analyzing the characteristics of Chinese news events. Secondly, the three factors were quantified and their influence weights were calculated. Finally, GRA was used to combine the three factors, and a GRA model between events was established to realize event correlation recognition. The experimental results show that the three factors are effective for event correlation recognition and that, compared with methods using only one influence factor, the proposed algorithm improves the accuracy of event correlation recognition.
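
To show how GRA can combine several quantified factors into one score, here is a minimal sketch of the textbook grey relational grade. The choice of reference sequence, the distinguishing coefficient of 0.5, the equal default weights, and the function name are assumptions; the abstract does not give the paper's exact formulation or weights.

```python
def grey_relational_grades(reference, candidates, weights=None, rho=0.5):
    """Grey relational grade of each candidate sequence against a reference.

    reference: factor values of the ideal case (for example, full correlation).
    candidates: one sequence of quantified factor values per event pair.
    weights: per-factor weights summing to 1 (equal weights if omitted).
    rho: distinguishing coefficient, conventionally 0.5.
    """
    diffs = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if weights is None:
        weights = [1.0 / len(reference)] * len(reference)
    grades = []
    for row in diffs:
        if d_max == 0:
            grades.append(1.0)  # every candidate coincides with the reference
            continue
        # Grey relational coefficient for each factor, then the weighted mean.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades
```

In this setting each candidate would hold the three quantified factors for one event pair (trigger co-occurrence, shared nouns, sentence similarity), and a higher grade would indicate a stronger correlation between the two events.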

Security scheme of XML database service using improved polyphonic splitting

YANG Gang, CHEN Yue, HUANG Huixin, YU Zhe

Journal of Computer Applications 2013, 33 (6): 1637-1641. DOI: 10.3724/SP.J.1087.2013.01637

Securely outsourcing a data owner's data to a Database Service Provider (DSP) provides XML database service for companies and organizations and is an important form of data service in cloud computing. This paper proposed an improved polyphonic splitting scheme for XML database service (IPSS-XML). IPSS-XML overcame the low verification efficiency of existing schemes by adding Assistant Verifying Data (AVD) to each non-leaf node at low cost. The improvement enhances query execution efficiency without breaking the confidentiality constraints.

Application of active learning to recommender system in communication network

CHEN Ke-jia, HAN Jing-yu, ZHENG Zheng-zhong, ZHANG Hai-jin

Journal of Computer Applications 2012, 32 (11): 3038-3041. DOI: 10.3724/SP.J.1087.2012.03038

The existence of potential links in sparse networks poses a major challenge for link prediction. Active learning was introduced into the link prediction task in order to mine the latent information in the large number of unconnected node pairs in networks. The unlabeled examples about which the system was most uncertain were selected and then labeled by the users; these examples give the system a higher information gain. Experimental results on Nodobo, a real communication network dataset, show that the proposed method using active learning improves the accuracy of predicting potential contacts for communication users.
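
The selection step can be sketched as simple uncertainty sampling. This is an assumed reading of "most uncertain": node pairs whose predicted link probability is closest to 0.5 are queried first. The function names, the dictionary layout, and the `oracle` callback standing in for the user are hypothetical.

```python
def most_uncertain(probabilities, batch_size):
    """Return the node pairs whose predicted link probability is nearest 0.5.

    probabilities: dict mapping an unlabeled node pair -> predicted probability.
    """
    ranked = sorted(probabilities, key=lambda pair: abs(probabilities[pair] - 0.5))
    return ranked[:batch_size]


def query_round(probabilities, oracle, labeled, batch_size=2):
    """Ask the oracle (standing in for a user) to label the most uncertain pairs."""
    pool = {pair: p for pair, p in probabilities.items() if pair not in labeled}
    for pair in most_uncertain(pool, batch_size):
        labeled[pair] = oracle(pair)
    return labeled
```

After each round the link predictor would be retrained on the enlarged labeled set and the probabilities recomputed before the next query.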

Color image segmentation of multi-resolution Markov random field in combination with multi-space characteristics

YANG Hua-yong, YU Zheng-hong, ZHENG Chen

Journal of Computer Applications 2011, 31 (12): 3378-3381.

This paper proposed a new Multi-Space Multi-Resolution Markov Random Field (MS-MRMRF) model. To address the inadequate description of color images in the single RGB space, the proposed model first transformed images from the RGB color space to the HSV color space and combined the two color spaces into a multi-space feature; then a new multi-resolution Markov model, whose parameters were estimated by fuzzy theory, was designed to segment the image based on the multi-space feature. Experiments on color images demonstrate that the MS-MRMRF model achieves higher segmentation accuracy than the multi-resolution MRF with a single RGB space.
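
The multi-space feature can be illustrated with a small sketch that concatenates the RGB and HSV values of each pixel. It covers only the feature construction, not the Markov random field segmentation. The 8-bit input range, the scaling to [0, 1], and the function names are assumptions.

```python
import colorsys


def multi_space_feature(pixel):
    """Concatenate the RGB and HSV values of one 8-bit pixel, scaled to [0, 1]."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (r, g, b, h, s, v)


def image_features(image):
    """Apply multi_space_feature to every pixel of a row-major image."""
    return [[multi_space_feature(pixel) for pixel in row] for row in image]
```

For example, a pure red pixel `(255, 0, 0)` maps to the six-dimensional feature `(1.0, 0.0, 0.0, 0.0, 1.0, 1.0)`, so the segmentation model sees both the raw channel intensities and the hue, saturation and value of the same pixel.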